1 Introduction

In this paper, we consider a bivariate process $(X_t, V_t)_{t\ge 0}$ with dynamics described by
the following equations:

\[
\begin{cases}
dX_t = \sqrt{V_t}\,dB_t, & X_0 = 0,\\
dV_t = b(V_t)\,dt + \sigma(V_t)\,dW_t, & V_0 = \eta, \quad V_t > 0 \text{ for all } t \ge 0,
\end{cases}
\tag{1}
\]

where $(B_t, W_t)_{t\ge 0}$ is a standard bidimensional Brownian motion and $\eta$ is independent
of $(B_t, W_t)_{t\ge 0}$. Our aim is to propose and study nonparametric estimators of $b(\cdot)$ and
$\sigma^2(\cdot)$ on the basis of discrete time observations of the process $X$ only.
Model (1) was introduced by Hull and White (1987) under the name of Stochastic
Volatility model. It is often adopted in finance to model stock prices, stock indexes
or short term interest rates: see for instance Hull and White (1987), Anderson and
Lund (1997), the review of Stochastic Volatility models in Ghysels et al. (1996) or the
recent book by Shephard (2005) and the references therein. See also an econometric
analysis of the subject in Barndorff-Nielsen and Shephard (2002).

The approach to model (1) is often parametric: the unknown functions are
specified up to a few unknown parameters, see the popular examples of Heston (1993)
or Cox, Ingersoll and Ross (1985). General statistical parametric approaches to the
problem are studied in Genon-Catalot et al. (1999), Hoffmann (2002), Gloter (2007),
Aït-Sahalia and Kimmel (2007). A nonparametric estimation of the stationary density
of $V_t$ is studied in Comte and Genon-Catalot (2006). A recent proposal for nonparametric
estimation of the drift and diffusion coefficients of $V$ can be found in Reno (2006),
who studies the empirical performance of a Nadaraya-Watson kernel strategy on two
parametric simulated examples. Our approach is new and different, and it is based on
a nonparametric mean square strategy. We consider the same probabilistic and sampling
settings as Gloter (2007) and follow the ideas developed in Comte et al. (2006,
2007), where direct or integrated discrete observations of the process $(V_t)$ are considered.
Here, our assumptions ensure that $(V_t)$ is stationary and we consider discrete
time observations $(X_{\ell\delta})_{1\le \ell\le n+1}$ of the process $(X_t)$ in the so-called high frequency
context: $\delta$ is small, $n$ is large and $n\delta = T$, the time interval where observations are
taken, is large.

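We assume that $n = kN$ and define, as usual, for $i = 0, 1, \dots, N-1$, the realized
quadratic variation associated with $(X_{\ell\delta})_{ik+1\le \ell\le (i+1)k}$:

\[ \hat{\bar V}_i = \frac{1}{k\delta}\sum_{j=0}^{k-1}\left(X_{(ik+j+1)\delta} - X_{(ik+j)\delta}\right)^2. \]

Setting $\Delta = k\delta$, $\hat{\bar V}_i$ provides an approximation of the integrated volatility:

\[ \bar V_i = \frac{1}{\Delta}\int_{i\Delta}^{(i+1)\Delta} V_s\,ds, \tag{2} \]

which in turn may be, for well chosen $k, \delta$, a satisfactory approximation of $V_{i\Delta}$. We
have in mind to obtain regression-type equations, for $\ell = 1, 2$:

\[ Y^{(\ell)}_{i+1} = f^{(\ell)}(\hat{\bar V}_i) + \text{noise} + \text{remainder}, \]

where

\[ f^{(1)} = b,\quad Y^{(1)}_i = \frac{\hat{\bar V}_{i+1} - \hat{\bar V}_i}{\Delta} \quad\text{and}\quad f^{(2)} = \sigma^2,\quad Y^{(2)}_i = \frac{3}{2}\,\frac{(\hat{\bar V}_{i+1} - \hat{\bar V}_i)^2}{\Delta}. \tag{3} \]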
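As a rough illustration of the two preprocessing steps just described, the sketch below computes the block-wise realized quadratic variations and then the increment-based responses. The function names `realized_variation` and `regression_responses` are hypothetical, the path is assumed to be stored as a plain list `x` in which `x[l]` stands for the observation at time `l * delta`, and the responses follow the reading of (3) in which both are built from the difference of two consecutive realized variations.

```python
from typing import List, Tuple


def realized_variation(x: List[float], k: int, delta: float) -> List[float]:
    """Realized quadratic variation of x over consecutive blocks of k increments.

    Block i uses the increments x[i*k + j + 1] - x[i*k + j], j = 0..k-1, and is
    normalised by k * delta (the block length, called Delta in the text).
    """
    n_increments = len(x) - 1
    n_blocks = n_increments // k  # N such that n = k * N; leftover increments are dropped
    out = []
    for i in range(n_blocks):
        total = 0.0
        for j in range(k):
            inc = x[i * k + j + 1] - x[i * k + j]
            total += inc * inc
        out.append(total / (k * delta))
    return out


def regression_responses(v_hat: List[float], big_delta: float) -> Tuple[List[float], List[float]]:
    """Responses built from consecutive differences of the realized variations.

    y1 targets the drift b, y2 targets the squared diffusion coefficient.
    """
    diffs = [v_hat[i + 1] - v_hat[i] for i in range(len(v_hat) - 1)]
    y1 = [d / big_delta for d in diffs]
    y2 = [1.5 * d * d / big_delta for d in diffs]
    return y1, y2
```

With `k = 250` and `delta = 2e-3` (the values used in the simulations of Section 5) one would call `realized_variation(x, 250, 2e-3)` and then `regression_responses(v_hat, 0.5)`.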

Choosing a collection of finite dimensional spaces, we use the regression-type equations
to construct estimators on these spaces. Then, we propose a data driven procedure to
select a relevant estimation space in the collection. As usual with these methods, the
risk of an estimator $\hat f$ of $f = b$ or $\sigma^2$ is measured via $\mathbb{E}(\|\hat f - f\|_N^2)$ where
$\|\hat f - f\|_N^2 = (1/N)\sum_{i=0}^{N-1}(\hat f - f)^2(\hat{\bar V}_i)$. We obtain risk bounds which can be interpreted as $n, N$
tend to infinity, $\delta, \Delta$ tend to 0 and $T = n\delta = N\Delta$ tends to infinity. These bounds are
compared with Hoffmann's (1999) minimax rates in the case of direct observations of
$V$. For what concerns $b$, our method leads to the best rate that can be expected. For
what concerns $\sigma^2$, no benchmark is available in this asymptotic framework. Indeed,
Gloter (2000) and Hoffmann (2002) only treat the case of observations within a fixed
length time interval, in a parametric setting. As is always the case, the rates are
different for the two functions.

The paper is organized as follows. Section 2 describes the assumptions on the model 
and the collection of estimation spaces. In Section 3, the estimators are defined and 
their risks are studied. Section 4 completes the procedure by the data driven selection 
of the estimation space. Examples of models and simulation results are presented in 
Section 5. Lastly, proofs are gathered in Section 6. 



2 The assumptions 

2.1 Model assumptions. 

Let $(X_t, V_t)_{t\ge 0}$ be given by (1) and assume that only discrete time observations of $X$,
$(X_{\ell\delta})_{1\le \ell\le n+1}$, are available. We want to estimate the drift function $b$ and the square
of the diffusion coefficient $\sigma^2$ when $V$ is stationary and exponentially $\beta$-mixing. We
assume that the state space of $(V_t)$ is a known open interval $(r_0, r_1)$ of $\mathbb{R}^+$ and consider
the following set of assumptions.

[A1] $0 \le r_0 < r_1 \le +\infty$, $\mathring I = (r_0, r_1)$, with $\sigma(v) > 0$ for all $v \in \mathring I$. Let $I = [r_0, r_1] \cap \mathbb{R}$.
The function $b$ belongs to $C^1(I)$, $b'$ is bounded on $I$, $\sigma^2 \in C^2(I)$, $(\sigma^2)'\sigma$ is Lipschitz
on $I$, $(\sigma^2)''$ is bounded on $I$ and $\sigma^2(v) \le \sigma_1^2$ for all $v$ in $I$.

[A2] For all $v_0, v \in \mathring I$, the scale density $s(v) = \exp\left[-2\int_{v_0}^{v} b(u)/\sigma^2(u)\,du\right]$ satisfies
$\int_{r_0} s(x)\,dx = +\infty = \int^{r_1} s(x)\,dx$, and the speed density $m(v) = 1/(\sigma^2(v)s(v))$
satisfies $\int_{r_0}^{r_1} m(v)\,dv = M < +\infty$.

[A3] $\eta \sim \pi$ and $\forall i$, $\mathbb{E}(\eta^{2i}) < \infty$, where $\pi(v)\,dv = (m(v)/M)\,1_{(r_0, r_1)}(v)\,dv$.

[A4] The process $(V_t)$ is exponentially $\beta$-mixing, i.e., there exist constants $K > 0$, $\theta > 0$,
such that, for all $t \ge 0$, $\beta_V(t) \le K e^{-\theta t}$.

Under [A1]-[A3], $(V_t)$ is strictly stationary with marginal distribution $\pi$, ergodic
and $\beta$-mixing, i.e. $\lim_{t\to+\infty} \beta_V(t) = 0$. Here, $\beta_V(t)$ denotes the $\beta$-mixing coefficient of
$(V_t)$ and is given by

\[ \beta_V(t) = \int_{r_0}^{r_1} \pi(v)\,dv\,\|P_t(v, dv') - \pi(v')\,dv'\|_{TV}. \]

The norm $\|\cdot\|_{TV}$ is the total variation norm and $P_t$ denotes the transition probability of
$(V_t)$ (see Genon-Catalot et al. (2000)). To prove our main result, we need the stronger
mixing condition [A4], which is satisfied in most standard examples. Under [A1]-[A4],
for fixed $\Delta$, $(\bar V_i)_{i\ge 0}$ is a strictly stationary process. And we have:

Proposition 2.1 Under [A1]-[A4], for fixed $k$ and $\delta$, $(\hat{\bar V}_i)_{i\ge 0}$ is strictly stationary and
$\beta_{\hat{\bar V}}(i) \le c\,\beta_V(i\Delta)$ for all $i \ge 1$.
2.2 Spaces of approximation

The functions $b$ and $\sigma^2$ are estimated only on a compact subset $A$ of the state space
$\mathring I$. For simplicity and without loss of generality, we assume from now on that

\[ A = [0, 1], \quad\text{and we set}\quad b_A = b\,1_A, \quad \sigma_A = \sigma\,1_A. \tag{4} \]

To estimate $f = b, \sigma^2$, we consider a family $S_m$, $m \in \mathcal{M}_n$, of finite dimensional
subspaces of $L_2([0, 1])$ and compute a collection of estimators $\hat f_m$ where for all $m$, $\hat f_m$
belongs to $S_m$. Afterwards, a data driven procedure chooses among the collection of
estimators the final estimator $\hat f_{\hat m}$.

We consider here simple projection spaces, namely trigonometric spaces, $S_m$, $m \in \mathcal{M}_n$.
The space $S_m$ is linearly spanned in $L_2([0, 1])$ by $\varphi_1, \dots, \varphi_{2m+1}$ with $\varphi_1(x) = 1_{[0,1]}(x)$,
$\varphi_j(x) = \sqrt{2}\cos(2\pi j x)\,1_{[0,1]}(x)$ for even $j$'s and $\varphi_j(x) = \sqrt{2}\sin(2\pi j x)\,1_{[0,1]}(x)$
for odd $j$'s larger than 1. We have $D_m = 2m + 1 = \dim(S_m) \le \mathcal{D}_n$ and $\mathcal{M}_n =
\{1, 3, \dots, \mathcal{D}_n\}$. The largest space in the collection has maximal dimension $\mathcal{D}_n$, which
is subject to constraints appearing later.
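A small sketch may help to fix ideas about the trigonometric spaces. It evaluates a basis function and checks orthonormality numerically. The names `trig_basis` and `inner_product` are hypothetical, and the code assumes the usual nested convention in which the pair of indices $2p$ and $2p+1$ carries the frequency $p$ (cosine for the even index, sine for the odd one), so that the first $2m+1$ functions span the trigonometric polynomials of degree at most $m$.

```python
import math


def trig_basis(j: int, x: float) -> float:
    """Value at x of the j-th trigonometric basis function on [0, 1], j >= 1."""
    if not 0.0 <= x <= 1.0:
        return 0.0  # every basis function carries the indicator of [0, 1]
    if j == 1:
        return 1.0
    freq = j // 2  # assumed convention: indices 2p and 2p + 1 share frequency p
    if j % 2 == 0:
        return math.sqrt(2.0) * math.cos(2.0 * math.pi * freq * x)
    return math.sqrt(2.0) * math.sin(2.0 * math.pi * freq * x)


def inner_product(a: int, b: int, grid_size: int = 2000) -> float:
    """Midpoint-rule approximation of the L2([0, 1]) inner product of two basis functions."""
    h = 1.0 / grid_size
    return h * sum(
        trig_basis(a, (i + 0.5) * h) * trig_basis(b, (i + 0.5) * h)
        for i in range(grid_size)
    )
```

The space $S_m$ then corresponds to the indices `1, ..., 2 * m + 1`, and `inner_product(a, b)` should be close to 1 when `a == b` and close to 0 otherwise.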

Actually, the theory requires smooth bases and regular wavelet bases would also 
be adequate. 

In connection with the collection of spaces $S_m$, we need an additional assumption
on the marginal density of the stationary process $(\hat{\bar V}_i)_{i\ge 0}$:

[A5] The process $(\hat{\bar V}_i)_{i\ge 0}$ admits a stationary density $\pi^*$ and there exist two positive
constants $\pi_0^*$ and $\pi_1^*$ (independent of $n, \delta$) such that $\forall m \in \mathcal{M}_n$, $\forall t \in S_m$,

\[ \pi_0^*\|t\|^2 \le \mathbb{E}(t^2(\hat{\bar V}_0)) \le \pi_1^*\|t\|^2. \tag{5} \]

The existence of the density $\pi^*$ is easy to obtain. The checking of (5) is more
technical. See the discussion on [A5] in Section 6.2. Below, we use the notations:

\[ \|t\|^2_{\pi^*} = \int t^2(x)\pi^*(x)\,dx, \quad \|t\|^2 = \int_0^1 t^2(x)\,dx \quad\text{and}\quad \|t\|_\infty = \sup_{x\in[0,1]} |t(x)|. \tag{6} \]



3 Mean squares estimators of the drift and volatility 

3.1 Regression equations 

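Recalling (3), we first prove the following developments, for $\ell = 1, 2$:

\[ Y^{(\ell)}_{i+1} = f^{(\ell)}(\hat{\bar V}_i) + Z^{(\ell)}_{i+1} + R^{(\ell)}(i+1), \tag{7} \]

where the $Z^{(\ell)}_i$'s are noise terms (with martingale properties) and the $R^{(\ell)}(i)$'s are
negligible residual terms given in Section 6. For the noise terms, we have, for $\ell = 1$
($f^{(1)} = b$):

\[ Z^{(1)}_i = \frac{1}{\Delta^2}\int_{i\Delta}^{(i+2)\Delta} \psi_{i\Delta}(u)\sigma(V_u)\,dW_u + (u_{i+1,k} - u_{i,k})/\Delta, \]

with

\[ \psi_{i\Delta}(u) = (u - i\Delta)\,1_{[i\Delta, (i+1)\Delta[}(u) + [(i+2)\Delta - u]\,1_{[(i+1)\Delta, (i+2)\Delta[}(u) \tag{8} \]

and

\[ u_{i,k} = \frac{1}{\Delta}\sum_{j=0}^{k-1}\left[\left(\int_{(ik+j)\delta}^{(ik+j+1)\delta} \sqrt{V_s}\,dB_s\right)^2 - \int_{(ik+j)\delta}^{(ik+j+1)\delta} V_s\,ds\right]. \]

Note that $\hat{\bar V}_i = \bar V_i + u_{i,k}$.

On the other hand, for $\ell = 2$ ($f^{(2)} = \sigma^2$), we have $Z^{(2)}_i = Z^{(2,1)}_i + Z^{(2,2)}_i + Z^{(2,3)}_i$ with

\[ Z^{(2,1)}_i = \frac{3}{2\Delta^3}\left[\left(\int_{i\Delta}^{(i+2)\Delta} \psi_{i\Delta}(s)\sigma(V_s)\,dW_s\right)^2 - \int_{i\Delta}^{(i+2)\Delta} \psi^2_{i\Delta}(s)\sigma^2(V_s)\,ds\right], \]

\[ Z^{(2,2)}_i = \frac{3}{\Delta}\,b(V_{i\Delta})\int_{i\Delta}^{(i+2)\Delta} \psi_{i\Delta}(s)\sigma(V_s)\,dW_s + \frac{3}{2\Delta^3}\int_{i\Delta}^{(i+2)\Delta}\left(\int_s^{(i+2)\Delta} \psi^2_{i\Delta}(u)\,du\right)[(\sigma^2)'\sigma](V_s)\,dW_s, \]

where $\psi_{i\Delta}$ is given in (8), and

\[ Z^{(2,3)}_i = \frac{3}{\Delta}(\bar V_{i+1} - \bar V_i)(u_{i+1,k} - u_{i,k}). \]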

3.2 Mean squares contrast 

Equation (7) gives a natural regression equation to estimate $f^{(\ell)}$. In light of this, we
consider the following contrast, for a function $t \in S_m$ where $S_m$ is a space of the
collection and for $\ell = 1, 2$:

\[ \gamma_N^{(\ell)}(t) = \frac{1}{N}\sum_{i=0}^{N-1}\left[Y^{(\ell)}_{i+1} - t(\hat{\bar V}_i)\right]^2. \tag{9} \]

Then the estimators are defined as

\[ \hat f_m^{(\ell)} = \arg\min_{t\in S_m} \gamma_N^{(\ell)}(t). \tag{10} \]

The minimization of $\gamma_N^{(\ell)}$ over $S_m$ usually leads to several solutions. In contrast, the
random $\mathbb{R}^N$-vector $(\hat f_m^{(\ell)}(\hat{\bar V}_0), \dots, \hat f_m^{(\ell)}(\hat{\bar V}_{N-1}))'$ is always uniquely defined. Indeed, let us
denote by $\Pi_m$ the orthogonal projection (with respect to the inner product of $\mathbb{R}^N$) onto
the subspace of $\mathbb{R}^N$, $\{(t(\hat{\bar V}_0), \dots, t(\hat{\bar V}_{N-1}))', t \in S_m\}$, then $(\hat f_m^{(\ell)}(\hat{\bar V}_0), \dots, \hat f_m^{(\ell)}(\hat{\bar V}_{N-1}))' =
\Pi_m Y^{(\ell)}$ where $Y^{(\ell)} = (Y_1^{(\ell)}, \dots, Y_N^{(\ell)})'$. This is the reason why we consider a properly
defined risk for $\hat f_m^{(\ell)}$ based on the design points. Thus, the error is measured via the
risk $\mathbb{E}(\|\hat f_m^{(\ell)} - f^{(\ell)}\|_N^2)$ where

\[ \|t\|_N^2 = \frac{1}{N}\sum_{i=0}^{N-1} t^2(\hat{\bar V}_i). \]

Let us mention that for a deterministic function $t$, $\mathbb{E}(\|t\|_N^2) = \|t\|^2_{\pi^*} = \int t^2(x)\pi^*(x)\,dx$.
Moreover, under Assumption [A5], the norms $\|\cdot\|$ and $\|\cdot\|_{\pi^*}$ are equivalent for functions
in $S_m$ (see notations (6)).
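The least squares step can be sketched as follows, for a generic finite family of basis functions. This is only an illustrative implementation: the names `solve_linear` and `least_squares_fit` are hypothetical, and the code assumes that the empirical Gram matrix is invertible, so that the coefficient vector itself (and not only the vector of fitted values at the design points) is unique.

```python
from typing import Callable, List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares_fit(
    basis: Sequence[Callable[[float], float]],
    design: Sequence[float],
    responses: Sequence[float],
) -> List[float]:
    """Coefficients minimising (1/N) * sum_i (responses[i] - t(design[i]))**2 over span(basis).

    design[i] plays the role of the i-th realized variation and responses[i]
    the role of the response paired with it in the contrast.
    """
    n_obs = len(design)
    feats = [[phi(v) for phi in basis] for v in design]
    dim = len(basis)
    gram = [[sum(f[p] * f[q] for f in feats) / n_obs for q in range(dim)] for p in range(dim)]
    rhs = [sum(f[p] * y for f, y in zip(feats, responses)) / n_obs for p in range(dim)]
    return solve_linear(gram, rhs)
```

The fitted function is then `lambda v: sum(c * phi(v) for c, phi in zip(coefs, basis))`, and the contrast at the minimiser is the mean of the squared residuals at the design points.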

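The following decomposition of the contrast holds:

\[ \gamma_N^{(\ell)}(t) - \gamma_N^{(\ell)}(f^{(\ell)}) = \|t - f^{(\ell)}\|_N^2 - \frac{2}{N}\sum_{i=0}^{N-1}\left(Y^{(\ell)}_{i+1} - f^{(\ell)}(\hat{\bar V}_i)\right)(t - f^{(\ell)})(\hat{\bar V}_i). \]

In view of (7), we define the centered empirical processes, for $\ell = 1, 2$:

\[ \nu_N^{(\ell)}(t) = \frac{1}{N}\sum_{i=0}^{N-1} t(\hat{\bar V}_i)\,Z^{(\ell)}_{i+1} \]

and the residual process:

\[ R_N^{(\ell)}(t) = \frac{1}{N}\sum_{i=0}^{N-1} t(\hat{\bar V}_i)\,R^{(\ell)}(i+1). \]

Then we obtain that

\[ \gamma_N^{(\ell)}(t) - \gamma_N^{(\ell)}(f^{(\ell)}) = \|t - f^{(\ell)}\|_N^2 - 2\nu_N^{(\ell)}(t - f^{(\ell)}) - 2R_N^{(\ell)}(t - f^{(\ell)}). \]

Let $f_m^{(\ell)}$ be the orthogonal projection of $f^{(\ell)}$ on $S_m$. Write simply that $\gamma_N^{(\ell)}(\hat f_m^{(\ell)}) \le
\gamma_N^{(\ell)}(f_m^{(\ell)})$ by definition of the estimator, and therefore that $\gamma_N^{(\ell)}(\hat f_m^{(\ell)}) - \gamma_N^{(\ell)}(f^{(\ell)}) \le
\gamma_N^{(\ell)}(f_m^{(\ell)}) - \gamma_N^{(\ell)}(f^{(\ell)})$. This yields

\[ \|\hat f_m^{(\ell)} - f^{(\ell)}\|_N^2 \le \|f_m^{(\ell)} - f^{(\ell)}\|_N^2 + 2\nu_N^{(\ell)}(\hat f_m^{(\ell)} - f_m^{(\ell)}) + 2R_N^{(\ell)}(\hat f_m^{(\ell)} - f_m^{(\ell)}). \]

The functions $\hat f_m^{(\ell)}$ and $f_m^{(\ell)}$ being $A$-supported, we can cancel the terms $\|f^{(\ell)} 1_{A^c}\|_N^2$ that
appear on both sides of the inequality. Therefore, we get

\[ \|\hat f_m^{(\ell)} - f_A^{(\ell)}\|_N^2 \le \|f_m^{(\ell)} - f_A^{(\ell)}\|_N^2 + 2\nu_N^{(\ell)}(\hat f_m^{(\ell)} - f_m^{(\ell)}) + 2R_N^{(\ell)}(\hat f_m^{(\ell)} - f_m^{(\ell)}). \tag{11} \]

Taking expectations and finding upper bounds for

\[ \mathbb{E}\Big(\sup_{t\in S_m, \|t\|=1} [\nu_N^{(\ell)}(t)]^2\Big) \quad\text{and}\quad \mathbb{E}\Big(\sup_{t\in S_m, \|t\|=1} [R_N^{(\ell)}(t)]^2\Big) \]

will give the rates for the risks of the estimators.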



3.3 Risk for the collection of drift estimators 

For the estimation of $b$, we obtain the following result.

Proposition 3.1 Assume that $N\Delta \ge 1$ and $1/k \le \Delta$. Assume that [A1]-[A5] hold and
consider a model $S_m$ in the collection of models with $\mathcal{D}_n \le O(\sqrt{N\Delta}/\ln(N))$ where $\mathcal{D}_n$
is the maximal dimension (see Section 2.2). Then the estimator $\hat f_m^{(1)} = \hat b_m$ of $f^{(1)} = b$
is such that

\[ \mathbb{E}(\|\hat b_m - b_A\|_N^2) \le 7\|b_m - b_A\|^2_{\pi^*} + K\,\frac{\mathbb{E}(\sigma^2(V_0))\,D_m}{N\Delta} + K'\Delta, \tag{12} \]

where $b_A = b\,1_{[0,1]}$ and $K$, $K'$ and $K''$ are some positive constants.

Note that the condition on $\mathcal{D}_n$ implies that $\sqrt{N\Delta}/\ln(N)$ must be large enough.

It follows from (12) that it is natural to select the dimension $D_m$ that leads to the
best compromise between the squared bias term $\|b_m - b_A\|^2_{\pi^*}$ (which decreases when
$D_m$ increases) and the variance term of order $D_m/(N\Delta)$.

Now, let us consider the classical high frequency data setting: let $\Delta = \Delta_n$, $k = k_n$
and $N = N_n$ be, in addition, such that $\Delta_n \to 0$, $N = N_n \to +\infty$, $N_n\Delta_n/\ln^2(N_n) \to
+\infty$ when $n \to +\infty$ and that $1/(k_n\Delta_n) \le 1$. Assume for instance that $b_A$ belongs
to a ball of some Besov space, $b_A \in \mathcal{B}_{\alpha,2,\infty}([0, 1])$, $\alpha \ge 1$, and that $\|b_m - b_A\|^2_{\pi^*} \le
\pi_1^*\|b_m - b_A\|^2$, then $\|b_A - b_m\|^2_{\pi^*} \le C(\alpha, L, \pi_1^*)D_m^{-2\alpha}$, for $\|b_A\|_{\alpha,2,\infty} \le L$ (see Lemma
12 in Barron et al. (1999)). Therefore, if we choose $D_m = (N_n\Delta_n)^{1/(2\alpha+1)}$, we obtain

\[ \mathbb{E}(\|\hat b_m - b_A\|_N^2) \le C(\alpha, L)(N_n\Delta_n)^{-2\alpha/(2\alpha+1)} + K'\Delta_n. \tag{13} \]

The first term $(N_n\Delta_n)^{-2\alpha/(2\alpha+1)} = T^{-2\alpha/(2\alpha+1)}$ is the optimal nonparametric rate
proved by Hoffmann (1999) for direct observation of $V$.

Now, let us find conditions under which the last term is negligible. For instance,
under the standard condition $\Delta_n = O(1/(N_n\Delta_n))$, the term $\Delta_n$ is negligible with
respect to $(N_n\Delta_n)^{-2\alpha/(2\alpha+1)}$.

Now, consider the choices $k_n = 1/\Delta_n$ and $\delta_n = n^{-c}$. Let us see if there are possible
choices of $c$ for which all our constraints are fulfilled. To have $n\delta_n \to +\infty$ requires
$0 < c < 1$. As $\Delta_n = k_n\delta_n = \delta_n/\Delta_n$, we have $\Delta_n = \sqrt{\delta_n} = n^{-c/2}$ and $N_n = n/k_n =
n^{1-c/2}$. Thus, $\Delta_n \to 0$ and $N_n, N_n\Delta_n \to +\infty$. Finally, the last constraint to fulfill is
that $N_n\Delta_n^2 = n^{1-3c/2} = O(1)$. Thus for $2/3 \le c < 1$, the dominating term in (13)
is $(N_n\Delta_n)^{-2\alpha/(2\alpha+1)}$, i.e. the minimax optimal rate. We have obtained a possible
"bandwidth" of steps $\delta_n$.

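The exponent bookkeeping of the last paragraph of Section 3.3 can be checked mechanically. The sketch below is only a convenience, with hypothetical names `drift_design_exponents` and `drift_constraints_hold`; it encodes the choice $k_n = 1/\Delta_n$, $\delta_n = n^{-c}$ and returns the power of $n$ carried by each quantity.

```python
from typing import Dict


def drift_design_exponents(c: float) -> Dict[str, float]:
    """Powers of n for k_n = 1/Delta_n and delta_n = n**(-c)."""
    return {
        "Delta_n": -c / 2.0,            # Delta_n = sqrt(delta_n)
        "N_n": 1.0 - c / 2.0,           # N_n = n / k_n = n * Delta_n
        "N_n*Delta_n": 1.0 - c,         # total observation time T = n * delta_n
        "N_n*Delta_n^2": 1.0 - 1.5 * c,
    }


def drift_constraints_hold(c: float) -> bool:
    """True when T grows, Delta_n vanishes and N_n * Delta_n**2 stays bounded."""
    exps = drift_design_exponents(c)
    return 0.0 < c < 1.0 and exps["N_n*Delta_n^2"] <= 0.0
```

For instance `drift_constraints_hold(0.8)` is true whereas `drift_constraints_hold(0.5)` is false, in line with the range of $c$ obtained in the text.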
3.4 Risk for the collection of volatility estimators 

For the collection of volatility estimators, we have the following result.

Proposition 3.2 Assume that [A1]-[A5] hold and consider a model $S_m$ in the collection
of models with maximal dimension $\mathcal{D}_n \le O(\sqrt{N\Delta}/\ln(N))$. Assume also that
$1/k \le \Delta$ and $N\Delta \ge 1$, $\Delta \le 1$. Then the estimator $\hat f_m^{(2)} = \hat\sigma_m^2$ of $f^{(2)} = \sigma^2$ is such that

\[ \mathbb{E}(\|\hat\sigma_m^2 - \sigma_A^2\|_N^2) \le 7\|\sigma_m^2 - \sigma_A^2\|^2_{\pi^*} + K\,\frac{\mathbb{E}(\sigma^4(V_0))\,D_m}{N} + K'\,\mathrm{Res}(D_m, k, \Delta), \tag{14} \]

where the residual term is given by

\[ \mathrm{Res}(D_m, k, \Delta) = D_m^2\Delta^2 + D_m^5\Delta^3 + \frac{D_m^3}{k^2} + \frac{1}{k^2\Delta^2}, \tag{15} \]

where $\sigma_A^2 = \sigma^2 1_{[0,1]}$, and $K$, $K'$ are some positive constants.

The discussion on rates is much more tedious. Consider the asymptotic setting described
for $b$. Assume that $\sigma_A^2$ belongs to a ball of some Besov space, $\sigma_A^2 \in \mathcal{B}_{\alpha,2,\infty}([0, 1])$,
and that $\|\sigma_m^2 - \sigma_A^2\|^2_{\pi^*} \le \pi_1^*\|\sigma_m^2 - \sigma_A^2\|^2$, then $\|\sigma_A^2 - \sigma_m^2\|^2_{\pi^*} \le C(\alpha, L, \pi_1^*)D_m^{-2\alpha}$, for
$\|\sigma_A^2\|_{\alpha,2,\infty} \le L$. Therefore, if we choose $D_m = N_n^{1/(2\alpha+1)}$, and $1/k_n \le \Delta_n$, we obtain

\[ \mathbb{E}(\|\hat\sigma_m^2 - \sigma_A^2\|_N^2) \le C(\alpha, L, \pi_1^*)N_n^{-2\alpha/(2\alpha+1)} + K'\,\mathrm{Res}(N_n^{1/(2\alpha+1)}, k_n, \Delta_n). \tag{16} \]

The first term $N_n^{-2\alpha/(2\alpha+1)}$ is the optimal nonparametric rate proved by Hoffmann (1999)
when $N_n$ discrete time observations of $V$ are available.

For the second term, let us set $k_n = n^a$, $\Delta_n = n^{-b}$, $\delta_n = n^{-c}$, and recall that
$n\delta_n = N_n\Delta_n$ and $n/N_n = k_n$, so that $N_n = n^{1-a}$ and $a + b = c$. We look for $a, b$ such
that

\[ \mathrm{Res}(N_n^{1/(2\alpha+1)}, k_n, \Delta_n) \le N_n^{-2\alpha/(2\alpha+1)}. \]

For this, we take $1/(k_n\Delta_n)^2 = N_n^{-2\alpha/(2\alpha+1)}$, which implies $2(a-b)/(1-a) = 2\alpha/(2\alpha+1)$.
We get

\[ a = \frac{(2\alpha+1)c + \alpha}{5\alpha+2}, \qquad b = \frac{(3\alpha+1)c - \alpha}{5\alpha+2}. \]

Then we impose $N_n^{2/(2\alpha+1)}\Delta_n^2 \le N_n^{-2\alpha/(2\alpha+1)}$, which is equivalent to

\[ 2b \ge [(2\alpha+2)/(2\alpha+1)](1-a) \;\Rightarrow\; c \ge (3\alpha+2)/[2(2\alpha+1)]. \]

Next, $N_n^{5/(2\alpha+1)}\Delta_n^3 \le N_n^{-2\alpha/(2\alpha+1)}$ leads to

\[ 3b \ge [(2\alpha+5)/(2\alpha+1)](1-a) \;\Rightarrow\; c \ge (7\alpha+5)/(11\alpha+8). \]

Lastly, $N_n^{3/(2\alpha+1)}/k_n^2 \le N_n^{-2\alpha/(2\alpha+1)}$ holds for $-2a \le -[(3+2\alpha)/(2\alpha+1)](1-a)$, i.e.
$c \ge 2(\alpha+3)/(6\alpha+5)$.

The optimal dimension has also to fulfill $N_n^{1/(2\alpha+1)} \le \mathcal{D}_n \le \sqrt{N_n\Delta_n}$, i.e. $-[(2\alpha-1)/[2(2\alpha+1)]](1-a) \le -b/2$,
which implies $c \le (5\alpha-2)/(5\alpha)$. Finally, we must have

\[ c \in \left[\frac{3\alpha+2}{2(2\alpha+1)}, \frac{5\alpha-2}{5\alpha}\right]. \]

This interval is nonempty as soon as $\alpha > 2$.

In terms of the initial number $n$ of observations, the rate is now $(n^{1-a})^{-2\alpha/(2\alpha+1)}$,
where $1-a$ is at most $1/2$, when $\alpha \to +\infty$. This is consistent with Gloter's (2000)
result: in the parametric case, he obtains $n^{-1/2}$ instead of $n^{-1}$ for the quadratic risk.
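The algebra behind the exponents $a$, $b$ and the admissible range of $c$ is easy to get wrong by hand, so a short exact-arithmetic check may be useful. The helper names `volatility_exponents` and `admissible_c_interval` are hypothetical; the formulas are the ones stated above for $a$, $b$ and for the two bounds that define the final interval, with the regularity $\alpha$ restricted to integers purely for convenience.

```python
from fractions import Fraction
from typing import Tuple


def volatility_exponents(alpha: int, c: Fraction) -> Tuple[Fraction, Fraction]:
    """Exponents (a, b) with k_n = n**a and Delta_n = n**(-b), given delta_n = n**(-c)."""
    a = ((2 * alpha + 1) * c + alpha) / Fraction(5 * alpha + 2)
    b = ((3 * alpha + 1) * c - alpha) / Fraction(5 * alpha + 2)
    return a, b


def admissible_c_interval(alpha: int) -> Tuple[Fraction, Fraction]:
    """Lower and upper bounds of the interval of admissible c stated in the text."""
    lower = Fraction(3 * alpha + 2, 2 * (2 * alpha + 1))
    upper = Fraction(5 * alpha - 2, 5 * alpha)
    return lower, upper
```

For `alpha = 3` the interval is `[11/14, 13/15]`, and picking `c = Fraction(4, 5)` gives `a = 43/85`, `b = 5/17`, which satisfy both `a + b == c` and `2 * (a - b) / (1 - a) == 6/7`, i.e. $2\alpha/(2\alpha+1)$.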



4 Data driven estimator of the coefficients 

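The second step is to ensure an automatic selection of $D_m$, which does not use any
knowledge on $f^{(\ell)}$, and in particular which does not require to know the regularity $\alpha$.
This selection is standardly done by setting

\[ \hat m^{(\ell)} = \arg\min_{m\in\mathcal{M}_n}\left[\gamma_N^{(\ell)}(\hat f_m^{(\ell)}) + \mathrm{pen}^{(\ell)}(m)\right], \tag{17} \]

with $\mathrm{pen}^{(\ell)}(m)$ a penalty to be properly chosen. We denote by $\tilde f^{(\ell)} = \hat f^{(\ell)}_{\hat m^{(\ell)}}$ the resulting
estimator and we need to determine pen such that, ideally,

\[ \mathbb{E}(\|\tilde f^{(\ell)} - f_A^{(\ell)}\|_N^2) \le C\inf_{m\in\mathcal{M}_n}\left(\|f_A^{(\ell)} - f_m^{(\ell)}\|^2 + \frac{\mathbb{E}(\sigma^{2\ell}(V_0))\,D_m}{N\Delta^{2-\ell}}\right) + \text{negligible terms}, \]

with $C$ a constant which should not be too large.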






4.1 Result for the data driven estimator of b
We almost reach this aim for the estimation of $b$.

Theorem 4.1 Assume that [A1]-[A5] hold, $1/k \le \Delta$, $\Delta \le 1$ and $N\Delta \ge 1$. Consider
the collection of models with maximal dimension $\mathcal{D}_n \le O(\sqrt{N\Delta}/\ln(N))$. Then the
estimator $\tilde b = \hat f^{(1)}_{\hat m^{(1)}}$ of $b$ where $\hat m^{(1)}$ is defined by (17) with

\[ \mathrm{pen}^{(1)}(m) \ge \kappa\sigma_1^2\,\frac{D_m}{N\Delta}, \tag{18} \]

where $\kappa$ is a universal constant, is such that

\[ \mathbb{E}(\|\tilde b - b_A\|_N^2) \le C\inf_{m\in\mathcal{M}_n}\left(\|b_m - b_A\|^2_{\pi^*} + \mathrm{pen}^{(1)}(m)\right) + K\left(\Delta + \frac{1}{N\Delta} + \frac{1}{\ln^2(N)\,k\Delta}\right). \tag{19} \]

For comments on the practical calibration of the penalty, see Section 5.2.
It follows from (19) that the adaptive estimator automatically realizes the bias-variance
compromise, provided that the last terms can be neglected as discussed above.
Here, the bandwidth for the choices of $\delta_n$ is slightly narrowed because of a stronger
constraint. More precisely, we choose $1/(k_n\Delta_n) = \Delta_n$ (instead of 1 previously), that is
$k_n = \Delta_n^{-2}$, so that $\Delta_n = k_n\delta_n = \Delta_n^{-2}\delta_n$. Therefore $\Delta_n = \delta_n^{1/3}$ and if $\delta_n = n^{-c}$, then
$\Delta_n = n^{-c/3}$. Also, $N_n = n/k_n = n^{1-2c/3}$, $N_n\Delta_n = n\delta_n = n^{1-c}$, $N_n\Delta_n^2 = n^{1-4c/3}$.
Hence if $3/4 < c < 1$, we have altogether: $N_n$, $N_n\Delta_n/\ln^2(N_n)$ tend to infinity with $n$,
and $\Delta_n$, $N_n\Delta_n^2$ tend to zero.

In that case, whenever $b_A$ belongs to some Besov ball (see (13)), and if $\|b_m - b_A\|^2_{\pi^*} \le
\pi_1^*\|b_m - b_A\|^2$, then $\tilde b$ achieves the optimal corresponding nonparametric rate.
Note that, in the parametric framework, Gloter (2007) obtains an efficient estimation
of $b$ in the same asymptotic context.



4.2 Result for the data driven estimator of the volatility 
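We can prove the following theorem.

Theorem 4.2 Assume that [A1]-[A5] hold, $1/k \le \Delta$, $\Delta \le 1$ and $N\Delta \ge 1$. Consider
the collection of models with maximal dimension $\mathcal{D}_n \le \sqrt{N\Delta}/\ln(N)$. Then the
estimator $\tilde\sigma^2 = \hat f^{(2)}_{\hat m^{(2)}}$ of $\sigma^2$ where $\hat m^{(2)}$ is defined by (17) with

\[ \mathrm{pen}^{(2)}(m) \ge \kappa\sigma_1^4\,\frac{D_m}{N}, \tag{20} \]

where $\kappa$ is a universal constant, is such that

\[ \mathbb{E}(\|\tilde\sigma^2 - \sigma_A^2\|_N^2) \le C\inf_{m\in\mathcal{M}_n}\left(\|\sigma_m^2 - \sigma_A^2\|^2_{\pi^*} + \mathrm{pen}^{(2)}(m)\right) + C'\,\mathrm{Res}(N, k, \Delta), \tag{21} \]

where

\[ \mathrm{Res}(N, k, \Delta) = N\Delta^3 + N^{5/2}\Delta^{11/2} + \frac{N^{3/2}\Delta^{3/2}}{k^2} + \frac{1}{k^2\Delta^2}. \tag{22} \]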
Now, if $\sigma_A^2$ belongs to a ball of some Besov space, $\sigma_A^2 \in \mathcal{B}_{\alpha,2,\infty}([0, 1])$, then automatically,

\[ \inf_{m\in\mathcal{M}_n}\left(\|\sigma_m^2 - \sigma_A^2\|^2_{\pi^*} + \mathrm{pen}^{(2)}(m)\right) = O(N_n^{-2\alpha/(2\alpha+1)}) \]

without requiring the knowledge of $\alpha$. Therefore,

\[ \mathbb{E}(\|\tilde\sigma^2 - \sigma_A^2\|_N^2) \le C(\alpha, L)N_n^{-2\alpha/(2\alpha+1)} + C'\,\mathrm{Res}(N_n, k_n, \Delta_n). \]

It remains to study the residual term. Notice that we do not know the optimal minimax
rate for estimating $\sigma^2$, under our set of assumptions on the models and on the
asymptotic framework. However, Gloter (2000) and Hoffmann (2002), with observations
within a fixed length time interval, obtain the parametric rate $n^{-1/2}$ (in variance).
Taking this as a benchmark, we try to make the residual less than $O(n^{-1/2})$.
Let us set $k_n = n^a$, $\Delta_n = n^{-b}$, hence $N_n = n/k_n = n^{1-a}$ and $N_n\Delta_n = n^{1-(a+b)}$.
This yields that $1 - a - 3b$, $(5 - 5a - 11b)/2$, $(3 - 7a - 3b)/2$, $2(b - a)$ must all be less
than or equal to $-1/2$, in association with $a + b < 1$ and $N_n^{1/(2\alpha+1)} \le \sqrt{N_n\Delta_n}$. This
set of constraints is not empty (e.g. $a = 9/16$, $b = 5/16$ fits).
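The example $a = 9/16$, $b = 5/16$ can be verified with exact fractions. The sketch below uses the hypothetical helpers `residual_exponents` and `residual_is_small`; it only checks the four exponent conditions and $a + b < 1$, and leaves aside the additional constraint linking $N_n^{1/(2\alpha+1)}$ and $\sqrt{N_n\Delta_n}$, which depends on $\alpha$.

```python
from fractions import Fraction
from typing import List


def residual_exponents(a: Fraction, b: Fraction) -> List[Fraction]:
    """Powers of n of the four terms of Res(N_n, k_n, Delta_n) for k_n = n**a, Delta_n = n**(-b)."""
    return [
        1 - a - 3 * b,              # N * Delta^3
        (5 - 5 * a - 11 * b) / 2,   # N^(5/2) * Delta^(11/2)
        (3 - 7 * a - 3 * b) / 2,    # N^(3/2) * Delta^(3/2) / k^2
        2 * (b - a),                # 1 / (k^2 * Delta^2)
    ]


def residual_is_small(a: Fraction, b: Fraction) -> bool:
    """True when a + b < 1 and every residual term is at most of order n**(-1/2)."""
    threshold = Fraction(-1, 2)
    return a + b < 1 and all(e <= threshold for e in residual_exponents(a, b))
```

Calling `residual_exponents(Fraction(9, 16), Fraction(5, 16))` returns the exponents `-1/2, -5/8, -15/16, -1/2`, so the suggested pair indeed satisfies the four conditions.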



5 Examples and numerical simulation results 



In this section, we consider examples of diffusions and implement the estimation algorithm
on simulated data for the stochastic volatility model $X$ given by (1).



5.1 Simulated paths 

We consider the processes $V_t^{(i)}$ for $i = 1, \dots, 4$ specified by the couples of functions
$b_i, \sigma_i$, $i = 1, \dots, 4$:

1. $b_1(x) = x\left(-\theta\ln(x) + \frac{1}{2}c^2\right)$, $\sigma_1^2(x) = c^2x^2$, which corresponds to $\exp(U_t)$ for $U_t$ an
Ornstein-Uhlenbeck process, $dU_t = -\theta U_t\,dt + c\,dW_t$. Whatever the chosen step, $U_t$
is exactly simulated as an autoregressive process of order 1. We took $\theta = 1$ and
$c = 0.75$.

2. $b_2(x) = b_0(x-2)$, $\sigma_2^2(x) = \sigma_0^2(x-2)$, where $b_0(x) = -(1-x^2)\left[c^2x + \frac{\theta}{2}\ln\left(\frac{1+x}{1-x}\right)\right]$
and $\sigma_0(x) = c(1-x^2)$ are the diffusion coefficients of the process $\mathrm{th}(U_t)$ ($\mathrm{th}(x) =
(e^x - e^{-x})/(e^x + e^{-x})$, with the same parameters as for case 1). The process $V_t^{(2)}$
corresponds to $\mathrm{th}(U_t) + 2$, which is a positive bounded process.

3. $b_3(x) = x\left(b_0(\ln(x)) + \frac{1}{2}\sigma_0^2(\ln(x))\right)$ and $\sigma_3^2(x) = x^2\sigma_0^2(\ln(x))$, which corresponds to
the process $V_t^{(3)} = \exp(\mathrm{th}(U_t))$.

4. $b_4(x) = dc^2/4 - \theta x$, $\sigma_4^2(x) = c^2x$, which corresponds to the Cox-Ingersoll-Ross
process. A discrete time sample is obtained in an exact way by taking the squared Euclidean
norm of a $d$-dimensional Ornstein-Uhlenbeck process with parameters $-\theta/2$ and
$c/2$. We took $d = 9$, $\theta = 0.75$ and $c = 1/3$.
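The exact simulation of the fourth example can be sketched as follows. The function name `simulate_cir` and its arguments are hypothetical; the sketch assumes that each of the $d$ coordinates follows a stationary Ornstein-Uhlenbeck process with mean reversion $\theta/2$ and diffusion coefficient $c/2$, updated as an exact AR(1) recursion, and that the volatility is the sum of the squared coordinates (the squared norm is what produces the drift $dc^2/4 - \theta x$ and the diffusion coefficient $c^2x$ through Itô's formula).

```python
import math
import random
from typing import List


def simulate_cir(n_steps: int, step: float, d: int = 9, theta: float = 0.75,
                 c: float = 1.0 / 3.0, seed: int = 0) -> List[float]:
    """Exact discrete sample of a CIR process as the squared norm of a d-dimensional OU process."""
    rng = random.Random(seed)
    decay = math.exp(-theta * step / 2.0)                 # AR(1) coefficient of each coordinate
    stat_sd = c / (2.0 * math.sqrt(theta))                # stationary standard deviation
    noise_sd = stat_sd * math.sqrt(1.0 - decay * decay)   # innovation standard deviation
    u = [rng.gauss(0.0, stat_sd) for _ in range(d)]       # start from the stationary law
    path = [sum(x * x for x in u)]
    for _ in range(n_steps):
        u = [decay * x + noise_sd * rng.gauss(0.0, 1.0) for x in u]
        path.append(sum(x * x for x in u))
    return path
```

With the parameters quoted in the text the stationary mean of the simulated process is $dc^2/(4\theta) = 1/3$, which gives a quick sanity check on a long path.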

We obtain samples of discrete observations of the processes $(V^{(j)}_{\ell\delta'})_{1\le \ell\le N'}$ for $j =
1, \dots, 4$ with $\delta' = \delta/10$, $N'\delta' = T$, from which we generate $(X^{(j)}_{\ell\delta})_{1\le \ell\le n}$, by using that

\[ X_{\ell\delta} - X_{(\ell-1)\delta} = \left(\int_{(\ell-1)\delta}^{\ell\delta} V_s\,ds\right)^{1/2}\varepsilon_\ell, \]

with $(\varepsilon_\ell)$ i.i.d. $\mathcal{N}(0, 1)$ independent of $(V_s, s \ge 0)$. Approximations of the integrated
processes are computed by discrete integration (with a trapezoidal method).

The generated $(V^{(i)}_{\ell\delta'})$, $i = 1, \dots, 4$, samples have length $N' = 5 \times 10^6$, for a step $\delta' =
1000/(5 \times 10^6) = 2 \times 10^{-4}$, and the integrated process is computed using 10 data; therefore,
we obtain $n = 5 \times 10^5$ and $\delta = 2 \times 10^{-3}$, for $T = n\delta = 1000$. Different values of $k$ are used,
but the best value, $k = 250$, corresponds to $\Delta = k\delta = 0.5$ and $N = 2000$ data for the
same $T$.

                k = 150           k = 200           k = 250           k = 300           k = 500
b mean          1.70 × 10^-3      1.87 × 10^-3      1.95 × 10^-3      2.1 × 10^-3       2.91 × 10^-3
(std)           (5.38 × 10^-4)    (5.06 × 10^-4)    (4.93 × 10^-4)    (4.92 × 10^-4)    (4.68 × 10^-4)
σ² mean         14.8 × 10^-5      6.23 × 10^-5      8.77 × 10^-5      15.3 × 10^-5      28.6 × 10^-5
(std)           (3.26 × 10^-5)    (2.26 × 10^-5)    (3.74 × 10^-5)    (4.0 × 10^-5)     (3.39 × 10^-5)

Table 1 Mean squared errors (with standard deviations in parentheses) for the estimation of
$b$ and $\sigma^2$, 100 paths of the CIR process, different values of $k$ for the quadratic variation, when
using the trigonometric basis.

Process         V_t^(1) [T]       V_t^(2) [T]       V_t^(3) [T]       V_t^(4) [T]       V_t^(4) [GP]
b mean          4.08 × 10^-2      7.51 × 10^-2      7.05 × 10^-2      1.95 × 10^-3      1.04 × 10^-3
(std)           (6.89 × 10^-3)    (8.56 × 10^-3)    (8.12 × 10^-3)    (4.93 × 10^-4)    (2.89 × 10^-4)
σ² mean         1.42 × 10^-1      1.89 × 10^-2      8.32 × 10^-2      8.77 × 10^-5      4.61 × 10^-5
(std)           (3.47 × 10^-2)    (1.54 × 10^-3)    (1.61 × 10^-2)    (3.74 × 10^-5)    (3.19 × 10^-5)

Table 2 Mean squared errors (with standard deviations in parentheses) for the estimation of
$b$ and $\sigma^2$, 100 paths of the processes $V_t^{(i)}$, $i = 1, \dots, 4$, when using the trigonometric basis
(except the last column, piecewise polynomial basis), $k = 250$.
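The passage from a finely sampled volatility path to the observed process can be sketched as follows. The names `integrated_variance` and `simulate_x` are hypothetical, the volatility path is assumed to be given on the fine grid including time 0 (so a path covering `m` coarse steps has `m * ratio + 1` points), and the integral over each coarse step is approximated with the trapezoidal rule as described in the text.

```python
import math
import random
from typing import List


def integrated_variance(v_fine: List[float], fine_step: float, ratio: int) -> List[float]:
    """Trapezoidal approximation of the integral of V over consecutive blocks of `ratio` fine steps."""
    n_blocks = (len(v_fine) - 1) // ratio
    out = []
    for l in range(n_blocks):
        block = v_fine[l * ratio:(l + 1) * ratio + 1]  # ratio + 1 grid points
        area = fine_step * (sum(block) - 0.5 * (block[0] + block[-1]))
        out.append(area)
    return out


def simulate_x(v_fine: List[float], fine_step: float, ratio: int = 10, seed: int = 0) -> List[float]:
    """Path of X on the coarse grid, started at 0, with Gaussian increments of variance
    equal to the approximated integrated volatility of each coarse step."""
    rng = random.Random(seed)
    x = [0.0]
    for j in integrated_variance(v_fine, fine_step, ratio):
        x.append(x[-1] + math.sqrt(j) * rng.gauss(0.0, 1.0))
    return x
```

In the setting of this section one would use `fine_step = 2e-4` and `ratio = 10`, which yields the coarse step $\delta = 2 \times 10^{-3}$.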



5.2 Estimation algorithms and numerical results 

We use the algorithm of Comte and Rozenholc (2004). The precise calibration of penalties
is difficult and done for the trigonometric basis but also for a general piecewise
polynomial basis, described in detail in Comte et al. (2006). Additive correcting terms
are involved in the penalty. Such terms avoid under-penalization and are in accordance
with the fact that the theorems provide lower bounds for the penalty. The correcting
terms are asymptotically negligible and do not affect the rate of convergence. For the
trigonometric polynomial collection (denoted by [T]), the drift penalty ($i = 1$) and the
diffusion penalty ($i = 2$) are given by

\[ \kappa_i\,\frac{\hat s_i^2}{N}\left(D_m + \ln^{2.5}(D_m + 1)\right), \quad\text{with } D_m \text{ at most } [N\Delta/\ln^{1.5}(N)]. \]

For the penalty when considering general piecewise polynomial bases (denoted by
[GP]), we refer the reader to Comte et al. (2006). The constants $\kappa_1$ and $\kappa_2$ in both
drift and diffusion penalties have been set equal to 2. The term $\hat s_1^2$ replaces $\sigma_1^2/\Delta$ for
the estimation of $b$ and $\hat s_2^2$ replaces $\sigma_1^4$ for the estimation of $\sigma^2$. Let us first explain
how $\hat s_2^2$ is obtained. We run once the estimation algorithm of $\sigma^2$ with the basis [T] and
with a preliminary penalty where $\hat s_2^2$ is taken equal to $2\max_m(\gamma_N^{(2)}(\hat\sigma_m^2))$. This gives
a preliminary estimator $\tilde\sigma_0^2$. Afterwards, we take $\hat s_2^2$ equal to twice the 99.5%-quantile
of $\tilde\sigma_0^2$. The use of the quantile is here to avoid extreme values. We get $\tilde\sigma^2$. We use this
estimate and set $\hat s_1^2 = \max_{0\le k\le N-1}(\tilde\sigma^2(\hat{\bar V}_k))/\Delta$ for the penalty of $b$.

Fig. 1 Estimation of $b$ (left) and $\sigma^2$ (right) for 20 paths of the CIR process with the
trigonometric basis (top) and the piecewise polynomial basis (bottom), $k = 250$.

The results given by our algorithm are described in Figures 1 and 2. We plot in Figure 1
the true function (thick curve) and 20 estimated functions (thin curves) in the case of $b$
and $\sigma^2$ when using first the basis [T] and then the basis [GP], in the case of the CIR
process. We can see that the trigonometric basis finds the right slope in the central part
of the interval, whereas the basis [GP] in general selects only one bin and a straight
curve, but with a slightly too small slope. The same type of result holds in Figure 2 for
the exponential Ornstein-Uhlenbeck process. For comparison with direct or integrated
observations of $V$, we refer to Comte et al. (2006, 2007). It is not surprising that in the
case of a stochastic volatility model, empirical results are less satisfactory and require a
large number of observations.

We also give in Tables 1 and 2 results of Monte-Carlo type experiments. In Table 1,
we show the results of the estimation procedure with the basis [T] and the CIR process
when choosing different values of $k$ for building the quadratic variation. Clearly, there
is an optimal value. If $k$ is too large, there are not enough observations left for the
estimation algorithm. If $k$ is too small, bias phenomena appear, related to the violation
of the theoretical assumptions (mainly $1/k \le \Delta$). We repeated the experiment for the
other processes and obtained analogous results. In general, for this sample size, the
choice $k = 250$ seems to be relevant. In Table 2, we can see from the last two columns
that the basis [GP] seems to be better than [T], at least for the CIR process. The
errors are computed as the mean over 100 simulated paths of the empirical errors (e.g.
$(1/N)\sum_{i=0}^{N-1}[\tilde b(\hat{\bar V}_i) - b(\hat{\bar V}_i)]^2$ for $b$).
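The penalized selection rule (17), combined with the calibrated penalty of this section, can be sketched as follows. Everything here is an illustration under stated assumptions: the names `penalty` and `select_dimension` are hypothetical, the penalty is taken to be $\kappa\,\hat s^2\,(D_m + \ln^{2.5}(D_m+1))/N$ with $\kappa = 2$, and the contrast values are assumed to have been computed beforehand for each candidate dimension.

```python
import math
from typing import Sequence


def penalty(dim: int, s2: float, n_blocks: int, kappa: float = 2.0) -> float:
    """Assumed penalty shape: kappa * s2 * (dim + ln(dim + 1) ** 2.5) / n_blocks."""
    return kappa * s2 * (dim + math.log(dim + 1) ** 2.5) / n_blocks


def select_dimension(contrasts: Sequence[float], dims: Sequence[int],
                     s2: float, n_blocks: int) -> int:
    """Dimension minimising the penalized contrast among the candidates."""
    scores = [g + penalty(d, s2, n_blocks) for g, d in zip(contrasts, dims)]
    return dims[scores.index(min(scores))]
```

For example, with contrasts `[1.0, 0.5, 0.49]` for the dimensions `[1, 3, 5]`, `s2 = 1.0` and `n_blocks = 100`, the rule keeps dimension 3: the small gain in fit obtained with dimension 5 does not compensate for the larger penalty.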



6 Discussion on the assumptions and proofs 

6.1 Proof of Proposition 2.1

We start with some preliminaries. Let $I_t = \int_0^t V_s\,ds$. The joint process $(V_t, I_t)_{t\ge 0}$ is a
two dimensional diffusion satisfying:

\[
\begin{cases}
dV_t = b(V_t)\,dt + \sigma(V_t)\,dW_t, & V_0 = v_0,\\
dI_t = V_t\,dt, & I_0 = i_0.
\end{cases}
\]

Under regularity assumptions on $b$ and $\sigma$, this process admits a transition density,
say $q_t(v_0, i_0; v, i)$ for the conditional density of $(V_t, I_t)$ given $V_0 = v_0$, $I_0 = i_0$. This
density is w.r.t. the Lebesgue measure on $(0, +\infty)^2$ (see Rogers and Williams (2000)).
We assume that these assumptions hold.
Now, let us set

\[ J_{\ell\delta} = \int_{(\ell-1)\delta}^{\ell\delta} V_s\,ds, \quad \ell \ge 1. \tag{23} \]

The discrete time process $(V_{\ell\delta}, J_{\ell\delta})_{\ell\ge 1}$ is strictly stationary and Markov. Its one step
transition operator is given by the density:

\[ (v, j) \to q_\delta(v_0, 0; v, j) := q_\delta(v_0; v, j). \]

Its stationary density is given by $\int \pi(v_0)\,dv_0\,q_\delta(v_0; v, j) := \pi_\delta(v, j)$.
Let us set, for $\ell \ge 1$,

\[ Z_\ell = X_{\ell\delta} - X_{(\ell-1)\delta} \tag{24} \]

and define $\varepsilon_\ell$ by the relation: $Z_\ell = J_{\ell\delta}^{1/2}\varepsilon_\ell$. Conditionally on $(V_t)_{t\ge 0}$, the random variables
(r.v.) $Z_\ell$, $\ell \ge 1$, are independent and have distribution $\mathcal{N}(0, J_{\ell\delta})$. Consequently,
the r.v. $(\varepsilon_\ell, \ell \ge 1)$ are i.i.d. with distribution $\mathcal{N}(0, 1)$ and the sequence $(\varepsilon_\ell, \ell \ge 1)$ is
independent of $(V_t)_{t\ge 0}$. Hence $(Z_\ell)_{\ell\ge 1}$ and $(\hat{\bar V}_i)_{i\ge 0}$ are strictly stationary processes.
From the preliminaries and the above remarks, we deduce that the process $(V_{\ell\delta}, J_{\ell\delta}, \varepsilon_\ell)_{\ell\ge 1}$
is stationary Markov. Its $\ell$-step transition operator is given by:

\[ Q_\delta^{(\ell)}(v_0; dv, dj, du) = q_\delta^{(\ell)}(v_0; v, j)\,n(u)\,dv\,dj\,du \]

where $q_\delta^{(\ell)}(v_0; v, j)$ is the $\ell$-step transition density of $(V_{\ell\delta}, J_{\ell\delta})$ and $n(u)$ is the standard
Gaussian density. The stationary density of $(V_{\ell\delta}, J_{\ell\delta}, \varepsilon_\ell)_{\ell\ge 1}$ is $\pi_\delta(v, j)\,n(u)$. Hence

\[ \|Q_\delta^{(\ell)}(v_0; dv, dj, du) - \pi_\delta(v, j)\,n(u)\,dv\,dj\,du\|_{TV} = \int |q_\delta^{(\ell)}(v_0; v, j) - \pi_\delta(v, j)|\,n(u)\,dv\,dj\,du = \int |q_\delta^{(\ell)}(v_0; v, j) - \pi_\delta(v, j)|\,dv\,dj. \]

We may now use the representation of the $\beta$-mixing coefficient of strictly stationary
Markov processes (see e.g. Genon-Catalot et al. (2000)) to compute

\[ \beta_{V_{\cdot\delta}, J_{\cdot\delta}, \varepsilon_\cdot}(\ell) = \int \pi_\delta(v_0, j_0)\,n(u_0)\,du_0\,dv_0\,dj_0\,\|Q_\delta^{(\ell)}(v_0; dv, dj, du) - \pi_\delta(v, j)\,n(u)\,dv\,dj\,du\|_{TV}. \]

Now, we have $\beta_Z(\ell) \le \beta_{V_{\cdot\delta}, J_{\cdot\delta}, \varepsilon_\cdot}(\ell) = \beta_{V_{\cdot\delta}, J_{\cdot\delta}}(\ell) \le \beta_V((\ell-1)\delta)$. Finally,
$\beta_{\hat{\bar V}}(i) \le \beta_Z(ik) \le \beta_V((ik-1)\delta) \le c\,\beta_V(i\Delta)$. □



6.2 Discussion on the assumptions 

Actually, Assumption [A3] is too strong. We only need the existence of moments up to
a certain order. Let us now discuss [A5]. Using the representation

\[ \hat{\bar V}_0 = \frac{1}{k\delta}\sum_{\ell=1}^{k} J_{\ell\delta}\,\varepsilon_\ell^2, \]

we see that $\hat{\bar V}_0$ has a conditional density given $(V_t, t \ge 0)$. Integrating this density w.r.t.
the distribution of $(J_{\ell\delta}, \ell = 1, \dots, k)$, we get that $\hat{\bar V}_0$ has a density $\pi^*$. However the
formula for $\pi^*$ is intractable.

On the other hand, we can obtain (5) by another approach. We have

\[ t^2(\bar V_0) = t^2(V_0) + (\bar V_0 - V_0)(t^2)'(V_0) + \frac{1}{2}(\bar V_0 - V_0)^2\int_0^1 (t^2)''(V_0 + u(\bar V_0 - V_0))\,du. \]

Now we use that, for any $t \in S_m$, there exists some constant $C$ such that

\[ \|(t^2)'\|_\infty \le CD_m^2\|t\|^2 \quad\text{and}\quad \|(t^2)''\|_\infty \le CD_m^3\|t\|^2. \]

Noting that $|\mathbb{E}(\bar V_0 - V_0 \mid V_0)| = O(\Delta)$, we get $|\mathbb{E}[(\bar V_0 - V_0)(t^2)'(V_0)]| \le CD_m^2\Delta\|t\|^2$.
On the other hand,

\[ \mathbb{E}\left|(\bar V_0 - V_0)^2\int_0^1 (t^2)''(V_0 + u(\bar V_0 - V_0))\,du\right| \le \|(t^2)''\|_\infty\,\mathbb{E}[(\bar V_0 - V_0)^2] \le CD_m^3\Delta\|t\|^2. \]

It follows that $|\mathbb{E}(t^2(\bar V_0) - t^2(V_0))| \le C\Delta D_m^3\|t\|^2$. Next,

\[ t^2(\hat{\bar V}_0) = t^2(\bar V_0) + (\hat{\bar V}_0 - \bar V_0)(t^2)'(V_0) + (\hat{\bar V}_0 - \bar V_0)[(t^2)'(\bar V_0) - (t^2)'(V_0)] + \frac{1}{2}(\hat{\bar V}_0 - \bar V_0)^2\int_0^1 (t^2)''(\bar V_0 + u(\hat{\bar V}_0 - \bar V_0))\,du. \]

By Gloter's (2007) Proposition 3.1, we have $|\mathbb{E}[(\hat{\bar V}_0 - \bar V_0) \mid V_0]| \le c\delta(1 + V_0)^c$ and
$\mathbb{E}[|\hat{\bar V}_0 - \bar V_0|^2] \le c/k$. Hence



\E(t 2 (V Q ) - t 2 (V Q ))\ < C\\t\\ 2 {AD 2 m + + ^). 



Since $1/k \le \Delta$,

\[ |\mathbb{E}(t^2(\hat{\bar V}_0) - t^2(\bar V_0))| \le C\|t\|^2\Delta D_m^3. \]

As there exist two positive constants $\pi_0$, $\pi_1$ such that $\forall v \in A$, $\pi_0 \le \pi(v) \le \pi_1$, we
obtain

\[ (\pi_0 - C\Delta\mathcal{D}_n^3)\|t\|^2 \le \|t\|^2_{\pi^*} \le (\pi_1 + C\Delta\mathcal{D}_n^3)\|t\|^2. \]

Under the constraint that $\Delta\mathcal{D}_n^3 = o(1)$, we get (5) for $n$ large enough. This constraint
is compatible with the other ones, see the discussion after Theorem 4.1.



6.3 Definition of the residuals and their properties 
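We have
\[ R^{(1)}(i+1) = b(\bar V_i) - b(\hat{\bar V}_i) + R_b^{(1)}((i+1)\Delta), \]
where $R_b^{(1)}$ is the residual term for $b$ studied in Comte et al. (2006, Proposition 3.1) and defined by
\[ R_b^{(1)}((i+1)\Delta) = b(V_{(i+1)\Delta}) - b(\bar V_i) + \frac{1}{\Delta^2} \int_{(i+1)\Delta}^{(i+3)\Delta} \psi_{(i+1)\Delta}(s)\,(b(V_s) - b(V_{(i+1)\Delta}))\,ds. \]
On the other hand,
\[ R^{(2)}(i+1) = \frac{3}{2\Delta}(u_{i+2,k} - u_{i+1,k})^2 + [\sigma^2(V_{(i+1)\Delta}) - \sigma^2(\hat{\bar V}_i)] + R_{\sigma^2}^{(2)}((i+1)\Delta), \]
where $R_{\sigma^2}^{(2)}$ is the residual term for $\sigma^2$ studied in Comte et al. (2006, Propositions 4.1, 4.2 and 4.3), defined by $R_{\sigma^2}^{(2)} = \sum_{m=1}^{3} R^{(2,m)}$ with
\[ R^{(2,1)}(i\Delta) = \frac{3}{2\Delta^3} \Big( \int_{i\Delta}^{(i+2)\Delta} \psi_{i\Delta}(s)\, b(V_s)\,ds \Big)^2, \]
\[ R^{(2,2)}(i\Delta) = \frac{3}{\Delta^3} \Big( \int_{i\Delta}^{(i+2)\Delta} \psi_{i\Delta}(s)\,(b(V_s) - b(V_{i\Delta}))\,ds \Big) \Big( \int_{i\Delta}^{(i+2)\Delta} \psi_{i\Delta}(s)\,\sigma(V_s)\,dW_s \Big), \]
\[ R^{(2,3)}(i\Delta) = \frac{3}{2\Delta^3} \int_{i\Delta}^{(i+2)\Delta} \Big( \int_s^{(i+2)\Delta} \psi_{i\Delta}^2(u)\,du \Big) \tau_{b,\sigma}(V_s)\,ds, \]
where $\tau_{b,\sigma} = (\sigma^2/2)(\sigma^2)'' + b(\sigma^2)'$. This decomposition is obtained by applying Itô's formula and Fubini's theorem.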

We may now summarize the following useful results, proved in Comte et al. (2006, Propositions 3.1, 4.1, 4.2 and 4.3):

Lemma 6.1 Under Assumptions [A1]-[A2]-[A3],

1. For $\ell = 1, 2$, for $m = 1, 2$, for all $i$, $E\{[R_b^{(1)}(i\Delta)]^{2m}\} \le c\Delta^{m}$ and $E\{[R_{\sigma^2}^{(2)}(i\Delta)]^{2m}\} \le c\Delta^{2m}$, that is, a bound of order $\Delta^{\ell m}$ for the residual of index $\ell$, where $c$ is a constant.

2. Let $Z_i^{(1)} = (1/\Delta^2) \int_{i\Delta}^{(i+2)\Delta} \psi_{i\Delta}(s)\,\sigma(V_s)\,dW_s$. For all $i$, $E([Z_i^{(1)}]^2) \le (2/(3\Delta))\,E(\sigma^2(V_0))$.

3. For all $i$, $E([Z_i^{(2,1)}]^2) \le c_1 E(\sigma^4(V_0))$ and $E([Z_i^{(2,2)}]^2) \le c_2 \sigma_1^2 \Delta$.
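
The constant $2/(3\Delta)$ in item 2 can be checked numerically in a simplified setting. The sketch below is only an illustration: it assumes that $\psi_{i\Delta}$ is the triangular weight equal to $s - i\Delta$ on $[i\Delta, (i+1)\Delta)$ and to $(i+2)\Delta - s$ on $[(i+1)\Delta, (i+2)\Delta)$ (its definition is not restated in this section), and it takes $\sigma \equiv 1$, so that the Itô isometry reduces $E([Z_i^{(1)}]^2)$ to $\Delta^{-4} \int \psi_{i\Delta}^2(s)\,ds$. The function names are hypothetical.

```python
def psi(s, i, delta):
    """Assumed triangular weight supported on [i*delta, (i+2)*delta)."""
    left, mid, right = i * delta, (i + 1) * delta, (i + 2) * delta
    if left <= s < mid:
        return s - left
    if mid <= s < right:
        return right - s
    return 0.0


def isometry_constant(i, delta, n_steps=20000):
    """Midpoint-rule value of (1/delta**4) * integral of psi^2 over its support.

    With sigma identically 1 (an assumption of this sketch), this is the
    second moment of Z_i^(1) given by the Ito isometry.
    """
    left = i * delta
    h = 2.0 * delta / n_steps
    total = 0.0
    for j in range(n_steps):
        s = left + (j + 0.5) * h
        total += psi(s, i, delta) ** 2 * h
    return total / delta ** 4


delta = 0.05
approx = isometry_constant(i=3, delta=delta)
exact = 2.0 / (3.0 * delta)  # the constant appearing in Lemma 6.1, item 2
print(approx, exact)
```

Under these assumptions the two printed values agree up to the quadrature error, which suggests that the bound of item 2 is an equality when $\sigma$ is constant.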
We also need the following result:

Lemma 6.2 Under assumptions [A1]-[A3], for any integer $i$, $E[(\hat{\bar V}_i - \bar V_i)^2] = E(u_{i,k}^2) \le 2E(V_0^2)/k$ and $E[(\hat{\bar V}_i - \bar V_i)^4] = E(u_{i,k}^4) \le 56\,E(V_0^4)/k^2$.

Proof of Lemma 6.2 This follows from Proposition 3.1 p. 504 in Gloter (2007). □
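
The order $1/k$ of the second moment in Lemma 6.2 can be illustrated by a small Monte Carlo experiment. This is a sketch under strong simplifying assumptions, not the setting of the lemma: the volatility is frozen at a constant $v$, and $\hat{\bar V}_i$ is taken to be the realized variance $\Delta^{-1}\sum_{j}(\text{increment of } X)^2$ over $k$ sub-intervals of length $\Delta/k$, a form assumed here because the estimator is defined outside this section. All names and numerical values are illustrative.

```python
import random


def realized_variance_error(v, k, delta_big, rng):
    """One draw of u = V_hat - V_bar when V_t = v is constant (assumption)."""
    small = delta_big / k  # sampling step, assuming Delta = k * small
    total = 0.0
    for _ in range(k):
        incr = rng.gauss(0.0, (v * small) ** 0.5)  # increment of X ~ N(0, v*small)
        total += incr * incr
    return total / delta_big - v  # V_bar equals v when V is constant


def second_moment(v, k, delta_big, n_rep, seed=0):
    """Monte Carlo estimate of E(u^2)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_rep):
        u = realized_variance_error(v, k, delta_big, rng)
        acc += u * u
    return acc / n_rep


v, k, delta_big = 0.8, 50, 0.1
est = second_moment(v, k, delta_big, n_rep=10000)
bound = 2.0 * v * v / k  # right-hand side 2 E(V_0^2) / k with V_0 = v
print(est, bound)
```

In this frozen-volatility case $u_{i,k}$ is a centered, rescaled chi-square variable with $k$ degrees of freedom, so $E(u_{i,k}^2) = 2v^2/k$ exactly and the estimate should be close to the bound.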



6.4 Proof of Propositions 3.1 and 3.2



For the sake of brevity, we give both proofs at the same time. The main difference lies in the orders of the expectations and in the appearance of a specific term in the study of the estimator of $\sigma^2$. Let us thus define $R_*^{(\ell)}$ for $\ell = 1, 2$ as $R_*^{(1)} = R^{(1)}$ and
\[ R_*^{(2)}(i+1) = R^{(2)}(i+1) - [\sigma^2(V_{(i+1)\Delta}) - \sigma^2(\hat{\bar V}_i)]. \]
Moreover let $T_N^{(1)}(t) = 0$ and
\[ T_N^{(2)}(t) = \frac{1}{N} \sum_{i=0}^{N-1} (\sigma^2(V_{(i+1)\Delta}) - \sigma^2(\hat{\bar V}_i))\, t(\hat{\bar V}_i). \]

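Let us consider the set
\[ \Omega_N = \Big\{ \omega :\ \Big| \frac{\|t\|_N^2}{\|t\|_{\pi^*}^2} - 1 \Big| \le \frac{1}{2},\ \forall t \in \bigcup_{m, m' \in \mathcal{M}_n} (S_m + S_{m'}) \setminus \{0\} \Big\}. \]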

On $\Omega_N$, $\|t\|_{\pi^*} \le \sqrt{2}\,\|t\|_N$. From (11), we deduce

\\Jm - J A \\N S \\Jm - J A \\N + o ll/m - 7m Htt* + sup ^ 

8 tes m ,p||„.=i 

+ 16 sup [T^\t)\ 2 
tes m ,\\t\\ w ,=i 

8 ^r,(/) ( 



(25) 



+gll^-^llSr + ^E^( i + D] S 



< llfW f( £ )|| 2 1 3 ||#W f( £ )|| 2 4. If! 
S ||/m - J A IliV+gll/m -/mllw + 16 



AT-1 



+±J sup [t#> (t)] 2 + A 52 [R** (* + 1)] 



sup 1 

tes m ,||t|| w «=i 

2 



(*) 



'0 *es„ 



ll=i 



In the last line above, we use the lower bound $\pi_0$ introduced in [A5].

Setting $B_m(0,1) = \{t \in S_m, \|t\| = 1\}$ and $B_m^{\pi^*}(0,1) = \{t \in S_m, \|t\|_{\pi^*} = 1\}$, the following holds on the set $\Omega_N$:



tes^*(o,i) 



16 

"o teB m (o,i) 



N-l 



We have the following result:

Lemma 6.3 Under assumptions [A1]-[A3] and [A5], if $1/k \le \Delta$, we have, for $\ell = 1, 2$,
\[ E\Big( \sup_{t \in B_m^{\pi^*}(0,1)} [\nu_N^{(\ell)}(t)]^2 \Big) \le K\, \frac{C_\ell D_m}{N \Delta^{2-\ell}}, \]
with $C_\ell = E(\sigma^{2\ell}(V_0))$.

The Lipschitz condition on $b$ and Lemma 6.2 imply that
\[ E[(b(\hat{\bar V}_i) - b(\bar V_i))^2] \le c_b\, E[(\hat{\bar V}_i - \bar V_i)^2] \le 2 c_b\, E(V_0^2)/k. \]
Consequently, there exists a constant $c$ such that

N-l 



Thus 

H\\i>rn-b A f N ln N )<7\\b m -b\\l* + ^K[ sup [i/J^t)] 2 ) +c"(A + k- 1 ). 
By gathering all bounds, we find 

E(||6 m - b\\%ln N ) < 7\\b m - b\\l. + K n°\V Q ))D m {l + _L_ } + R , {A + 
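On the other hand, Lemma 6.1 and Lemma 6.2 imply that
\[ E\Big( \frac{1}{N} \sum_{i=0}^{N-1} [R_*^{(2)}(i+1)]^2 \Big) \le \frac{2}{N} \sum_{i=0}^{N-1} E\Big( [R_{\sigma^2}^{(2)}((i+1)\Delta)]^2 + \frac{9}{4\Delta^2}(u_{i+2,k} - u_{i+1,k})^4 \Big) \le 2c\Delta^2 + \frac{C}{\Delta^2} E(u_{1,k}^4) \le C\Big( \Delta^2 + \frac{1}{k^2 \Delta^2} \Big). \]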



Next we need to bound $E(\sup_{t \in S_m, \|t\| = 1} [T_N^{(2)}(t)]^2)$. This is obtained in the following Lemma:

Lemma 6.4 Under the Assumptions of Proposition 3.2 and if $1/k \le \Delta$, there exists a constant $C$ such that
\[ E\Big( \sup_{t \in S_m, \|t\| = 1} [T_N^{(2)}(t)]^2 \Big) \le C\Big( D_m^2 \Delta^2 + D_m^5 \Delta^3 + \frac{D_m^3}{k^2} + \frac{D_m}{Nk} \Big). \]

We can use Lemma 6.1 in Comte et al. (2005) to obtain that, if $\mathcal{D}_n \le C' \sqrt{N\Delta}/\ln(N)$,
then 

This makes it possible to check that $E(\|\hat f_m - f_A\|_N^2\, 1_{\Omega_N^c}) \le C/N$, along the same lines as the analogous proof given p. 532 in Comte et al. (2007). For this reason, details are omitted.

□ 
6.5 Proof of Lemma 6.3



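Case $\ell = 1$. Next, let us define $\mathcal{F}_t = \sigma((W_s, B_s), 0 \le s \le t, \eta)$. We can use martingale properties to see that, $\forall t \in S_m$,
\[ E(t(\hat{\bar V}_i) Z_{i+1}^{(1)}) = E\big(E(t(\hat{\bar V}_i) Z_{i+1}^{(1)} \,|\, \mathcal{F}_{(i+1)\Delta})\big) = E\big(t(\hat{\bar V}_i)\, E(Z_{i+1}^{(1)} \,|\, \mathcal{F}_{(i+1)\Delta})\big) = 0 \]
because the last conditional expectation is zero. Moreover, the same tool shows that the covariance term $E(t(\hat{\bar V}_i)\, t(\hat{\bar V}_\ell)\, Z_{i+1}^{(1)} Z_{\ell+1}^{(1)})$ for $\ell \ge i+2$ is also null by inserting a conditional expectation given $\mathcal{F}_{(\ell+1)\Delta}$. Consequently, it is now easy to see that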



E 



sup [ V $> (t)f ) < e nvN&j)] < E Var 



aes m A\t\\=i 



3 = 1 



3 = 1 



N-l 



E mv^i 



N 



i=0 



3 = 1 



N 



^ 2 2 2 2 2 

Now, Lemma 6.2 implies that $E[(u_{i+2,k} - u_{i+1,k})^2/\Delta^2] \le 2E[(u_{i+2,k}^2 + u_{i+1,k}^2)/\Delta^2] \le c/(k\Delta^2)$. Then, applying also Lemma 6.1 (ii), it follows that, with



Case $\ell = 2$. Next, for the martingale terms, we write



E( sup \4\t)] 2 ) < -Lk( sup [4 2) W] 2 )<^E E (^ 2) (^) 



tes,^ (o,i) 



r o tefl m (o,i) 



i=i 



i=l 



jV-1 



E E 



'0 , = 



i=i 



i=0 



i=0 



V i=0 



iV-1 



Both terms are bounded separately. For the first one, we use that, for $r = 1, 2$,
\[ \mathrm{cov}\big( \varphi_j(\hat{\bar V}_i) Z_{i+1}^{(2,r)},\ \varphi_j(\hat{\bar V}_\ell) Z_{\ell+1}^{(2,r)} \big) = 0 \]
if $\ell \ge i+2$, by inserting a conditional expectation with respect to $\mathcal{F}_{(\ell+1)\Delta}$. Now, for



N-l 



j=i \o<i,e<N-i 



E E 

i - f^ -1 



< ^2 E E I E ^o^Wtoi+'I 



9 On 



< ^llE^ 2 H- E [(4 2 ' r) ) 2 ] <2^[5lE(a 4 (Vb)) + £ 2 zl] 



by using Lemma 6.1.

For the second part, let us define the filtration generated by $B$ and the whole path of $V$, i.e.
\[ \mathcal{G}_t = \sigma(V_s, s \in \mathbb{R}_+, B_s, s \le t) = \sigma(W_s, s \in \mathbb{R}_+, B_s, s \le t, \eta). \]

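Now we observe that
\[ E\big(t(\hat{\bar V}_i)(\bar V_{i+2} - \bar V_{i+1})\, u_{i+1,k}\big) = E\big[E\big(t(\hat{\bar V}_i)(\bar V_{i+2} - \bar V_{i+1})\, u_{i+1,k} \,|\, \mathcal{G}_{(i+1)\Delta}\big)\big] = E\big[t(\hat{\bar V}_i)(\bar V_{i+2} - \bar V_{i+1})\, E(u_{i+1,k} \,|\, \mathcal{G}_{(i+1)\Delta})\big] = 0 \]
as $E(u_{i+1,k} \,|\, \mathcal{G}_{(i+1)\Delta}) = 0$. Moreover for any $\ell > i$,
\[ E\big(t(\hat{\bar V}_i)(\bar V_{i+2} - \bar V_{i+1})\, u_{i+1,k}\ t(\hat{\bar V}_\ell)(\bar V_{\ell+2} - \bar V_{\ell+1})\, u_{\ell+1,k}\big) = 0 \]
by inserting a conditional expectation with respect to $\mathcal{G}_{(\ell+1)\Delta}$. The last remark is that one can easily see that
\[ E[(\bar V_{i+1} - \bar V_i)^2] \le \frac{1}{\Delta}\, E\Big[ \int_{(i+1)\Delta}^{(i+2)\Delta} (V_s - V_{s-\Delta})^2\, ds \Big] \le C\Delta. \]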



Now we have 



D m / N-l \* D m N-l 

E E hvz E ^(te+ 2 - ^H +1 , fc = E E E - Vifu 2 i+1 

j=l \ i=0 J j=l i=0 



< C 



£mj_ 

N kA' 



The second part of this term can be treated in the same way, and it follows that if $1/k \le \Delta$, then this term is less than $C' D_m/N$. □
6.6 Proof of Lemma 6.4

Let us recall that we know from Comte et al. (2006) that 

N-l 

= w E ^ 2{ y^+DA - a\Vi))t<yi) 



is such that 



N 
»=o 



S( sup [T^(t)] 2 ) < C(D 2 „zA 2 + D^Z\ 3 ) 



t£fl m (0,l) 



Here, we write that T (2) (t) = T^ A) (t) + T {2 ' 2) (t) + T$' 3 \t) + T%(t) with 

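\[ T_N^{(2,1)}(t) = \frac{1}{N} \sum_{i=0}^{N-1} [t(\hat{\bar V}_i) - t(\bar V_i)]\,[\sigma^2(\hat{\bar V}_i) - \sigma^2(\bar V_i)], \qquad T_N^{(2,2)}(t) = \frac{1}{N} \sum_{i=0}^{N-1} t(\bar V_i)\,[\sigma^2(\hat{\bar V}_i) - \sigma^2(\bar V_i)], \]
\[ T_N^{(2,3)}(t) = \frac{1}{N} \sum_{i=0}^{N-1} [t(\hat{\bar V}_i) - t(\bar V_i)]\,[\sigma^2(\bar V_i) - \sigma^2(V_{(i+1)\Delta})]. \]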

We shall use the following decompositions obtained by the Taylor formula:
\[ \sigma^2(\hat{\bar V}_i) - \sigma^2(\bar V_i) = (\hat{\bar V}_i - \bar V_i)(\sigma^2)'(\bar V_i) + R_i, \qquad t(\hat{\bar V}_i) - t(\bar V_i) = (\hat{\bar V}_i - \bar V_i)\, t'(\bar V_i) + S_i(t), \]
with $E(R_i^2) \le C/k^2$ and $E(R_i^4) \le C/k^4$ if $(\sigma^2)''$ is bounded, and $E(\sup_{t \in B_m(0,1)} S_i(t)^2) \le C D_m^5/k^2$, $E^{1/2}(\sup_{t \in B_m(0,1)} S_i(t)^4) \le C D_m^5/k^2$ because $\|t''\|_\infty^2 \le C D_m^5 \|t\|^2$. Now, the three terms can be studied as follows. First

N-l N-l 

r i?' 1) (*) = F E - ^) 2 (*')m)(^ 2 )'(^) + ^ E - 

i=0 i=0 
N-l N-l 

+n E (Vi - ViK^Ymm + - J2 Km 

and we bound each term successively. Clearly by Schwarz inequality applied to each 
term, we find, 

E( sup [Tp l ' 1 \t)?)<CEV 2 {V?)^- 
using that $\|t'\|_\infty^2 \le C D_m^3 \|t\|^2$,

E( sup [T (2 ^ 2 \t)] 2 )<C^, E( sup [T$ 1 '*\t)] 2 )<C^ 2 (V?)%, 

t£B m (0,l) K t€fl m (0,l) K 



and 

fc 4 



tes m (o,i) 

Therefore, if $1/k \le \Delta$, $E(\sup_{t \in B_m(0,1)} [T_N^{(2,1)}(t)]^2) \le C(D_m^3/k^2 + D_m^5/k^3)$.
Next, we write that
\[ T_N^{(2,2)}(t) = \frac{1}{N} \sum_{i=0}^{N-1} t(\bar V_i)(\sigma^2)'(\bar V_i)(\hat{\bar V}_i - \bar V_i) + \frac{1}{N} \sum_{i=0}^{N-1} t(\bar V_i) R_i = T_N^{(2,2,1)}(t) + T_N^{(2,2,2)}(t). \]
We obtain easily that
\[ E\Big( \sup_{t \in B_m(0,1)} [T_N^{(2,2,2)}(t)]^2 \Big) \le E\Big( \sup_{t \in B_m(0,1)} \|t\|_\infty^2\, \frac{1}{N} \sum_{i=0}^{N-1} R_i^2 \Big) \le \Phi_0^2 D_m E(R_1^2) \le C D_m/k^2, \]
a term which is negligible with respect to the previous ones.

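Then $(\hat{\bar V}_i - \bar V_i)\psi(\bar V_i)$ is a martingale increment with respect to the filtration $(\mathcal{G}_t)$, for any measurable function $\psi$. In particular,
\[ E[(\hat{\bar V}_i - \bar V_i)\psi(\bar V_i)] = E\big[E[(\hat{\bar V}_i - \bar V_i)\psi(\bar V_i) \,|\, \mathcal{G}_{i\Delta}]\big] = E\big[\psi(\bar V_i)\, E[(\hat{\bar V}_i - \bar V_i) \,|\, \mathcal{G}_{i\Delta}]\big] = 0 \]
since $E(\hat{\bar V}_i \,|\, \mathcal{G}_{i\Delta}) = \bar V_i$. In the same way, for $i < \ell$,
\[ E\big((\hat{\bar V}_i - \bar V_i)\psi(\bar V_i)(\hat{\bar V}_\ell - \bar V_\ell)\psi(\bar V_\ell)\big) = 0 \]
by inserting a conditional expectation with respect to $\mathcal{G}_{\ell\Delta}$. Therefore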

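\[ E\Big( \sup_{t \in B_m(0,1)} [T_N^{(2,2,1)}(t)]^2 \Big) \le \sum_{j=1}^{D_m} E\Big[ \Big( \frac{1}{N} \sum_{i=0}^{N-1} \varphi_j(\bar V_i)(\sigma^2)'(\bar V_i)(\hat{\bar V}_i - \bar V_i) \Big)^2 \Big] = \frac{1}{N^2} \sum_{j=1}^{D_m} \sum_{i=0}^{N-1} E\Big[ \big( \varphi_j(\bar V_i)(\sigma^2)'(\bar V_i)(\hat{\bar V}_i - \bar V_i) \big)^2 \Big] \]
\[ \le \frac{1}{N^2} \sum_{i=0}^{N-1} E\Big[ \Big( \sum_{j=1}^{D_m} \varphi_j^2(\bar V_i) \Big) [(\sigma^2)'(\bar V_i)]^2 (\hat{\bar V}_i - \bar V_i)^2 \Big] \le \frac{C D_m}{N}\, E^{1/2}[((\sigma^2)'(\bar V_1))^4]\, E^{1/2}[u_{1,k}^4] \le \frac{C D_m}{Nk}. \]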

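For the last term, we write $T_N^{(2,3)}(t) = T_N^{(2,3,1)}(t) + T_N^{(2,3,2)}(t)$ where
\[ T_N^{(2,3,1)}(t) = \frac{1}{N} \sum_{i=0}^{N-1} (\hat{\bar V}_i - \bar V_i)\, t'(\bar V_i)\, (\sigma^2(\bar V_i) - \sigma^2(V_{(i+1)\Delta})), \]
\[ T_N^{(2,3,2)}(t) = \frac{1}{N} \sum_{i=0}^{N-1} S_i(t)\, (\sigma^2(\bar V_i) - \sigma^2(V_{(i+1)\Delta})). \]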
Moreover, we know from Comte et al. (2006) that $E[(\sigma^2(\bar V_i) - \sigma^2(V_{(i+1)\Delta}))^2] \le E^{1/2}[(\sigma^2(\bar V_i) - \sigma^2(V_{(i+1)\Delta}))^4] \le C\Delta$. Now, for $T_N^{(2,3,1)}(t)$, we proceed as for $T_N^{(2,2,1)}(t)$ since both have the same martingale property w.r.t. $(\mathcal{G}_t)$. We get

N-l v 2 



D m ( N-l \ 

e( sup pf' 3 - 1 ^)] 2 ) < E E E ^(vm-v^iv^-a 2 ^^))) 
^ jf E E ((^) 2 (^i)(^i - PoVcft) - A*^)) 



3=1 

I 

3=1 

< ^ E V2 (u } )fc)E Va [(ff2( v l) _ ^(v^)) 4 ] 
- IvT 

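as $\sum_j (\varphi_j')^2(x) \le C D_m^3$. Using $D_m \le N\Delta$ and $1/k \le \Delta$ implies $E(\sup_{t \in B_m(0,1)} [T_N^{(2,3,1)}(t)]^2) \le C D_m^2 \Delta^3$. On the other hand, $E(\sup_{t \in B_m(0,1)} [T_N^{(2,3,2)}(t)]^2) \le C D_m^5 \Delta/k^2 \le C D_m^5 \Delta^3$, as $1/k \le \Delta$.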

By gathering and comparing all terms and assuming that $1/k \le \Delta$, we obtain the bound given in Lemma 6.4. □



6.7 Proof of Theorem 4.1

The proof of this theorem relies on the following Bernstein-type Inequality: 

Lemma 6.5 Under the assumptions of Theorem 4.1, for any positive numbers $\epsilon$ and $v$, we have
\[ P\Big( \sum_{i=0}^{N-1} t(\hat{\bar V}_i) Z_{i+1}^{(1)} \ge N\epsilon,\ \|t\|_N^2 \le v^2 \Big) \le \exp\Big( -\frac{N \Delta \epsilon^2}{2 \sigma_1^2 v^2} \Big). \]

Proof of Lemma 6.5 Noting that $W$ is a Brownian motion with respect to the augmented filtration $\mathcal{F}_s = \sigma((B_u, W_u), u \le s, \eta)$, the proof is obtained as the analogous proof in Comte et al. (2007), Lemma 2 p. 533. □

Now we turn to the proof of Theorem 4.1.
As in the proof of Proposition 3.1, we have to split $\|\tilde b - b_A\|_N^2 = \|\tilde b - b_A\|_N^2\, 1_{\Omega_N} + \|\tilde b - b_A\|_N^2\, 1_{\Omega_N^c}$. For the study on $\Omega_N^c$, the end of the proof of Proposition 3.1 can be used.

Now, we focus on what happens on $\Omega_N$. From the definition of $\tilde b$, we have, $\forall m \in \mathcal{M}_n$, $\gamma_N(\tilde b) + \mathrm{pen}(\hat m) \le \gamma_N(b_m) + \mathrm{pen}(m)$. We proceed as in the proof of Proposition 3.1 with some additional penalty terms and obtain
\[ E(\|\tilde b - b_A\|_N^2\, 1_{\Omega_N}) \le 7 \|b_m - b_A\|_{\pi^*}^2 + \mathrm{pen}(m) + 32\, E\Big( \sup_{t \in S_m + S_{\hat m},\, \|t\|_{\pi^*} = 1} [\nu_N^{(1)}(t)]^2\, 1_{\Omega_N} \Big) - E(\mathrm{pen}(\hat m)) + 32 c' \Delta. \]



The difficulty here is to control the supremum of $\nu_N^{(1)}(t)$ on a random ball (which depends on the random $\hat m$). This is done by setting $\nu_N^{(1)} = \nu_N^{(1,1)} + \nu_N^{(1,2)}$ with



JV-l N-l 
i=0 i=0 
We use the martingale property of $\nu_N^{(1,1)}(t)$ and a rough bound for $\nu_N^{(1,2)}(t)$ as follows. For $\nu_N^{(1,2)}$, we simply write, as previously



sup [4 1,2) (*)f| < A E SU P t^ 1,2) (*)] 2 

tes m +s A , ||t||„»=i / \t65„,||t||=i 

2>n 



42?r, 9 42X, 4 1 



ir$N L "" w J 1 ~ 7r*Nk n A 2 ~ TV* k n A- 



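For $\nu_N^{(1,1)}$, let us denote by
\[ G_m(m') = \sup_{t \in S_m + S_{m'},\, \|t\|_{\pi^*} = 1} |\nu_N^{(1,1)}(t)| \]
the quantity to be studied. Introducing a function $p(m, m')$, we first write
\[ G_m^2(\hat m)\, 1_{\Omega_N} \le [(G_m^2(\hat m) - p(m, \hat m))\, 1_{\Omega_N}]_+ + p(m, \hat m) \le \sum_{m' \in \mathcal{M}_n} [(G_m^2(m') - p(m, m'))\, 1_{\Omega_N}]_+ + p(m, \hat m). \]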
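
The second inequality above is purely deterministic: the positive part at the random index $\hat m$ is bounded by the sum of the positive parts over all indices. A minimal numerical sketch follows, with made-up values standing in for $G_m^2(m')$ and $p(m, m')$; the names `g_sq`, `p` and `decoupling_bound` are hypothetical and serve only to illustrate the step.

```python
import random


def positive_part(x):
    """Return max(x, 0)."""
    return x if x > 0.0 else 0.0


def decoupling_bound(g_sq, p, m_hat):
    """Sum over all m' of [g_sq[m'] - p[m']]_+ plus p[m_hat]."""
    return sum(positive_part(g - q) for g, q in zip(g_sq, p)) + p[m_hat]


rng = random.Random(1)
g_sq = [rng.random() for _ in range(6)]  # stand-ins for G_m^2(m')
p = [rng.random() for _ in range(6)]     # stand-ins for p(m, m')

# Whatever index plays the role of the selected model, the bound must hold.
violations = [m for m in range(6)
              if g_sq[m] > decoupling_bound(g_sq, p, m) + 1e-12]
print(violations)  # prints []
```

The point of this step is that the right-hand side no longer involves a supremum over a random ball: each term of the sum has a deterministic index $m'$ and can be handled by a deviation inequality for fixed $(m, m')$.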

Then pen is chosen such that $32\, p(m, m') \le \mathrm{pen}(m) + \mathrm{pen}(m')$. More precisely, the next Proposition determines the choice of $p(m, m')$ which in turn will fix the penalty.

Proposition 6.1 Under the assumptions of Theorem 4.1, there exists a numerical constant $\kappa_1$ such that, for $p(m, m') = \kappa_1 \sigma_1^2 (D_m + D_{m'})/(N\Delta)$, we have
\[ E[(G_m^2(m') - p(m, m'))\, 1_{\Omega_N}]_+ \le c\, \sigma_1^2\, \frac{e^{-D_{m'}}}{N\Delta}. \]



Proof of Proposition 6.1 The result of Proposition 6.1 follows from the inequality of Lemma 6.5 by the $L^2$-chaining technique used in Baraud et al. (2001b) (see Section 7 p. 44-47, Lemma 7.1, with $s^2 = \sigma_1^2/\Delta$). □

It is easy to see that the result of Theorem 4.1 follows from Proposition 6.1 with $\mathrm{pen}(m) = \kappa \sigma_1^2 D_m/(N\Delta)$. □



6.8 Proof of Theorem 4.2

The lines of the proof are the same as the ones of Theorem 4.1. Moreover, they follow closely the analogous proof of Theorem 2 p. 524 in Comte et al. (2007), see also Comte et al. (2006). Therefore, we omit it.